Automatically Extracting and Representing Collocations for Language Generation
Abstract
Collocational knowledge is necessary for language generation. The problem is that collocations come in a large variety of forms. They can involve two, three, or more words; these words can be of different syntactic categories, and they can be combined in more or less rigid ways. This leads to two main difficulties: collocational knowledge has to be acquired, and it must be represented flexibly so that it can be used for language generation. We address both problems in this paper, focusing on the acquisition problem. We describe a program, Xtract, that automatically acquires a range of collocations from large textual corpora, and we describe how they can be represented in a flexible lexicon using a unification-based formalism.
Similar resources
Retrieving Collocations by Co-occurrences and Word Order Constraints
In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method retrieves collocations in the following stages: 1) extracting strings of characters as units of collocations, and 2) extracting recurrent combinations of strings, in accordance with their word order in a corpus, as collocations. Through the method, a wide range of collocations, especially...
Extracting Collocations from Text Corpora
A collocation is a habitual word combination. Collocational knowledge is essential for many tasks in natural language processing. We present a method for extracting collocations from text corpora. By comparison with the SUSANNE corpus, we show that both high precision and broad coverage can be achieved with our method. Finally, we describe an application of the automatically extracted collocati...
Extracting Arabic Collocations Based on Jape Rules
The massive amount of digital information available in all disciplines has generated a critical need to organize and structure its content. Some of the existing tools for languages such as English or French can easily be adapted to the Arabic language. In some cases a simple configuration is sufficient, while in other cases significant modifications must be made to obtain acceptable results. We pres...
Discovering Collocations in Modern Greek Language
This paper describes two statistical methods for extracting collocations from text corpora written in Modern Greek: the mean and variance method and a method based on the χ² test. The mean and variance method calculates distances (“offsets”) between words in a corpus and looks for specific patterns of distance. The χ² test is combined with the formulation of a null hypothesis H0 for a samp...
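The offset computation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `offset_stats`, the window size of 4, and the restriction to forward offsets (the second word follows the first) are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pvariance


def offset_stats(tokens, window=4):
    """Return {(w1, w2): (count, mean_offset, variance)} for word pairs.

    Hypothetical helper: an offset is the number of positions by which
    w2 follows w1, considered only up to `window` positions ahead.
    """
    offsets = defaultdict(list)
    for i, w1 in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d < len(tokens):
                offsets[(w1, tokens[i + d])].append(d)
    # A pair seen several times with low variance keeps a stable distance,
    # which is the kind of pattern the mean and variance method looks for.
    return {
        pair: (len(ds), mean(ds), pvariance(ds))
        for pair, ds in offsets.items()
    }


tokens = "knock on the door knock at the door knock on his door".split()
stats = offset_stats(tokens)
count, avg, var = stats[("knock", "door")]
print(count, avg, var)
```

In this toy input, "door" always appears three positions after "knock", so the pair has zero variance. A real corpus would also need a minimum frequency threshold before a low variance is taken as evidence of a collocation.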
Collocation and Trillocation
In this paper we propose that the neglected three-word collocations (trillocations) should be emphasized in collocation study. From the point of view of colligations, more useful collocations could be covered by adding a third category. For a specific third word, it will help avoid the unnaturalness of a two-word collocation. A statistics-based automatic trillocation extraction system is propo...